1 Introduction


Economic inequality arises due to the inequality in the distribution of income and assets 
among individuals or groups within a society or region or even between countries. Economic 
inequality is usually measured to evaluate the effects of economic policies at the micro or 
macro level. In the economics literature, there are several inequality indexes that measure 
the economic inequality. Among those indexes, the Gini inequality index is the most widely used measure. The most celebrated Gini index, as given in Arnold (2005), is

$$G_F(X) = \frac{\Delta}{2\mu}, \quad \text{where } \Delta = E|X_1 - X_2|, \; \mu = E(X), \tag{1}$$


and $X_1$ and $X_2$ are two i.i.d. copies of a non-negative random variable $X$. The Gini index compares every individual's income with every other individual's income. If there are $n$ randomly selected individuals with incomes given by $X_1, \ldots, X_n$, then the estimator of the celebrated Gini index is


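$$\hat{G}_n = \frac{\hat{\Delta}_n}{2\bar{X}_n}, \tag{2}$$

where $\bar{X}_n$ is the sample mean and $\hat{\Delta}_n$ is the sample Gini's mean difference, defined as

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad \hat{\Delta}_n = \binom{n}{2}^{-1} \sum_{1 \le i_1 < i_2 \le n} |X_{i_1} - X_{i_2}|. \tag{3}$$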


The Gini index is undefined if $\bar{X}_n = 0$. We ignore this special case.
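The estimator in (2) and (3) can be computed without enumerating all pairs. The following is a minimal sketch, not code from the paper; the function name `sample_gini` is hypothetical. It relies on the identity that, for sorted data, the sum of all pairwise absolute differences equals a weighted sum of the order statistics.

```python
def sample_gini(incomes):
    """Sample Gini index: Gini's mean difference divided by twice the mean."""
    x = sorted(incomes)
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    if total == 0:
        raise ValueError("Gini index is undefined when the sample mean is zero")
    # For sorted data (0-based rank r), sum_{i<j} (x_j - x_i) = sum_r (2r - n + 1) * x_r.
    pair_sum = sum((2 * r - n + 1) * v for r, v in enumerate(x))
    gmd = 2.0 * pair_sum / (n * (n - 1))   # sample Gini's mean difference
    mean = total / n                       # sample mean
    return gmd / (2.0 * mean)
```

For example, `sample_gini([1, 3])` gives 0.5, since the mean difference is 2 and the mean is 2.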

For continuous evaluation of different economic policies implemented by the government, 
computation of Gini index for the whole country or a region is very important. One source 
of income or expenditure data for all households in a region is census data which is typically 
collected every 10 years. As a result, a Gini index computed from census data is available only once every 10 years.^1 Hence, an estimate of the Gini index for intermediate periods between two censuses cannot be obtained. Some developed countries conduct household surveys annually.^2 However, many countries cannot afford or do not conduct a household survey annually. For those countries, it is useful to draw a relatively small number of households to estimate the population Gini index. In order to estimate the Gini index for a region, a simple random sampling of households may be used.^3 For the existing literature on the use of simple random sampling to estimate inequality indexes, we refer to Gastwirth (1972), Bishop et al. (1997), Xu (2007), and Davidson (2009).

In the economics literature, there exist innovative methods for constructing confidence intervals for $G_F$ (e.g., see Xu (2007)). However, we know that the confidence interval varies from sample to sample, and so does its width. Wider confidence intervals provide less precise information about the true value of the parameter of interest. Since it is desirable to construct shorter confidence intervals, we instead fix the length of the confidence interval, or in other words the margin of error, while achieving the same confidence coefficient. Thus we want to construct a $100(1-\alpha)\%$ fixed-width confidence interval for $G_F$. This problem is known as the fixed-width confidence interval estimation problem.

No fixed-sample-size procedure can provide a solution to the fixed-width confidence interval estimation problem (e.g., see Dantzig (1940)). This problem falls in the domain of sequential analysis. For details about the general theory of fixed-width confidence interval estimation, we refer interested readers to Sen (1981) and Mukhopadhyay and De Silva (2009). Sequential analysis is concerned with studies where sample sizes are not fixed in advance, unlike fixed-sample-size procedures. Instead, the sequential estimation procedure depends on collecting observations until an a priori specified criterion, or stopping rule, is satisfied.

^1 For some countries, the reported Gini indexes are based on data more than 15 years old (see the World Bank website). Examples include Belize, Algeria, and Botswana, whose Gini indexes are based on data collected in 1999, 1995, and 1994, respectively.

^2 For instance, European Statistics on Income and Living Conditions conducts a household survey that collects data from at least 273000 individuals from each country in the European Union (see http://epp.eurostat.ec.europa.eu/cache/ITY_SDDS/EN/ilc_esms.htm#data_rev).

^3 For estimating the Gini index for smaller countries such as San Marino or Monaco, one can conduct small-scale surveys by simple random sampling of households.
We know that Gini's mean difference is a U-statistic with a symmetric kernel of degree 2 and the sample mean is a U-statistic with a symmetric kernel of degree 1 (e.g., see Hoeffding (1948)). Under a distribution-free scenario, Xu (2007) used the central limit theorem for U-statistics to construct a confidence interval for the Gini index. However, this cannot be used to find a fixed-width confidence interval for the Gini index. In this article, we solve the problem of obtaining a fixed-width confidence interval for the Gini index using a purely sequential procedure with a stopping rule based on several U-statistics. Apart from being unbiased estimators, U-statistics are also reverse martingales with respect to some non-increasing filtration, as proven in Lee (1990). For more literature on reverse martingales, we refer to classical textbooks on probability theory and stochastic processes such as Loève (1963), Doob (1953), and others. We exploit the reverse martingale property of U-statistics to derive attractive asymptotic properties of our proposed estimation procedure.

In the next section, we formally state the fixed-width confidence interval estimation problem and explain why a fixed-sample-size procedure cannot be used. In section 3, a purely sequential procedure is proposed to construct a $100(1-\alpha)\%$ fixed-width confidence interval for the unknown population Gini index, and the implementation and characteristics of the sequential procedure are discussed as well. Section 4 presents a simulation study that checks the theoretical results related to our procedure. We conclude this article with some remarks in section 5.


2 Problem Statement and Optimal Sample Size 

Consider $n$ randomly selected individuals from some population of interest with incomes denoted by $X_1, X_2, \ldots, X_n$. These are nonnegative random variables. A strongly consistent estimator of the population Gini index $G_F$ is $\hat{G}_n$ given in (2). For fixed $\alpha \in (0,1)$, the goal of this paper is to develop the theory for constructing a $100(1-\alpha)\%$ fixed-width confidence interval for $G_F$. Formally, we would like to construct a confidence interval $(\hat{G}_n - d, \hat{G}_n + d)$ such that

$$P\left(\hat{G}_n - d < G_F < \hat{G}_n + d\right) \ge 1 - \alpha, \tag{4}$$

for some prefixed margin of error $d > 0$. Using Xu (2007), we have

$$\sqrt{n}\left(\hat{G}_n - G_F\right) \xrightarrow{D} N(0, \xi^2) \quad \text{as } n \to \infty, \tag{5}$$

where $\xi^2$ is the asymptotic variance given by

$$\xi^2 = \frac{\Delta^2\sigma^2}{4\mu^4} - \frac{\Delta\tau}{\mu^3} + \frac{\Delta^2}{\mu^2} + \frac{\sigma_1^2}{\mu^2}. \tag{6}$$

Here, $\sigma^2$ is the variance of $X$,

$$\tau = E\left(X_1 |X_1 - X_2|\right) \quad \text{and} \quad \sigma_1^2 = \mathrm{Var}\left[E\left(|X_1 - X_2| \mid X_1\right)\right].$$


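Based on the asymptotic normality of $\hat{G}_n$, we observe that the coverage probability is

$$P\left(\hat{G}_n - d < G_F < \hat{G}_n + d\right) \approx 2\Phi\left(\frac{d\sqrt{n}}{\xi}\right) - 1,$$

where $\Phi$ is the distribution function of a standard normal random variable. In order to have a $100(1-\alpha)\%$ confidence interval, the sample size $n$ must satisfy

$$2\Phi\left(\frac{d\sqrt{n}}{\xi}\right) - 1 \ge 1 - \alpha. \tag{7}$$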


Solving (7) for $n$, we obtain $n \ge z_{\alpha/2}^2\xi^2/d^2$, where $z_{\alpha/2}$ is the upper $(\alpha/2)$th quantile of the standard normal distribution. Thus, the optimal (minimal) sample size required to construct a fixed-width confidence interval for the Gini index with approximately $(1-\alpha)$ coverage probability is

$$C = \left\lfloor \frac{z_{\alpha/2}^2\,\xi^2}{d^2} \right\rfloor + 1, \tag{8}$$

provided $\xi^2$ is known.

The optimal fixed sample size $C$ is unknown since the true value of $\xi^2$ is unknown in practice. If $C$ were known, one would just draw $C$ observations independently from the population of interest and compute $(\hat{G}_C - d, \hat{G}_C + d)$, which would satisfy (4) approximately. Since $C$ is unknown, one must draw samples in at least two stages in order to achieve the desired coverage probability at least approximately. In the first stage, one must estimate $C$ by estimating $\xi^2$, and then in the subsequent stages one should collect samples until the current sample size is greater than or equal to the estimated optimal sample size. In this article, we propose a sequential sampling procedure to estimate the optimal sample size $C$ and ensure that the fixed-width confidence interval based on the final sample size attains the desired $(1-\alpha)$ coverage probability asymptotically.
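To make the dependence of the optimal sample size on the margin of error concrete, here is a small sketch that evaluates the bound $n \ge z_{\alpha/2}^2\xi^2/d^2$ for a known variance. The function name `optimal_sample_size` is hypothetical, and rounding up to the next integer via floor plus one is an assumption about how the bound is turned into an integer.

```python
import math
from statistics import NormalDist


def optimal_sample_size(xi_sq, d, alpha):
    """Smallest integer exceeding z_{alpha/2}^2 * xi^2 / d^2 (xi^2 assumed known)."""
    if xi_sq < 0 or d <= 0 or not 0 < alpha < 1:
        raise ValueError("require xi_sq >= 0, d > 0 and 0 < alpha < 1")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    return math.floor(z * z * xi_sq / (d * d)) + 1
```

Halving $d$ roughly quadruples the required sample size, which is why a small margin of error such as $d = 0.01$ can call for more than a thousand observations.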


3 The Sequential Estimation Procedure 


In sequential estimation procedures, the parameter estimates are updated as the data is 
observed. In the first step, a small sample, called the pilot sample, is observed to gather 
preliminary information about the parameter of interest. Then, in each successive step, 
one or more additional observations are collected and the estimates of the parameters are 
updated. After each and every step a decision is taken whether to continue or to terminate 
the sampling process. This decision is based on a pre-defined stopping rule. 

From (8) we note that the optimal sample size needed to find a fixed-width confidence interval depends on the unknown parameter $\xi^2$. So, let us first find a good estimator of the unknown parameter $\xi^2$. Following Xu (2007) and Sproule (1969), we consider the following strongly consistent estimator of $\xi^2$ based on U-statistics. Let us define a U-statistic, for each $j = 1, 2, \ldots, n$,

$$\hat{\Delta}_n^{(j)} = \binom{n-1}{2}^{-1} \sum_{T_j} |X_{i_1} - X_{i_2}|, \tag{9}$$

where $T_j = \{(i_1, i_2) : 1 \le i_1 < i_2 \le n \text{ and } i_1, i_2 \ne j\}$. Define $W_{jn} = n\hat{\Delta}_n - (n-2)\hat{\Delta}_n^{(j)}$ for $j = 1, \ldots, n$, and $\bar{W}_n = n^{-1}\sum_{j=1}^{n} W_{jn}$. According to Sproule (1969), a strongly consistent estimator of $4\sigma_1^2$ is

$$S_{wn}^2 = (n-1)^{-1}\sum_{j=1}^{n}\left(W_{jn} - \bar{W}_n\right)^2.$$

Using Xu (2007),

$$\hat{\tau}_n = \frac{2}{n(n-1)} \sum_{1 \le i_1 < i_2 \le n} \frac{1}{2}\left(X_{i_1} + X_{i_2}\right)|X_{i_1} - X_{i_2}| \tag{10}$$

is an estimator of $\tau$. Let $S_n^2$ be the sample variance. Thus, the estimator of $\xi^2$ is

$$V_n^2 = \frac{\hat{\Delta}_n^2 S_n^2}{4\bar{X}_n^4} - \frac{\hat{\Delta}_n\hat{\tau}_n}{\bar{X}_n^3} + \frac{\hat{\Delta}_n^2}{\bar{X}_n^2} + \frac{S_{wn}^2}{4\bar{X}_n^2}, \tag{11}$$

similar to Xu (2007). Using Sproule (1969) and theorem 3.2.1 of Sen (1981), we conclude that $V_n^2$ is a strongly consistent estimator of $\xi^2$. Based on this estimator of $\xi^2$, we define the stopping rule $N_d$, for every $d > 0$, as


$$N_d \text{ is the smallest integer } n \, (\ge m) \text{ such that } n \ge \left(\frac{z_{\alpha/2}}{d}\right)^2 \left(V_n^2 + n^{-1}\right). \tag{12}$$


Here, $m$ is called the initial or pilot sample size, and the term $n^{-1}$ is known as a correction term. Note that $V_n^2$ can be very close to zero with positive probability. Without the correction term, the inequality (12) may be satisfied for very small $n$, terminating the sampling process too early. Thus the correction term $n^{-1}$ ensures that the sampling process for estimating the optimal sample size does not stop too early. For details about the correction term, we refer to Sen (1981).
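The variance estimator $V_n^2$ combines the sample mean, the sample variance, Gini's mean difference, the estimator of $\tau$, and the jackknife-type pseudo-values $W_{jn}$. The sketch below computes it directly from those definitions with a double loop over pairs. It is an illustration under stated assumptions rather than the authors' code: the name `xi_sq_estimate` is hypothetical, and the quadratic-time loop is chosen for transparency, not speed.

```python
def xi_sq_estimate(x):
    """Plug-in estimate V_n^2 of the asymptotic variance of the sample Gini index."""
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    if mean == 0:
        raise ValueError("estimate is undefined when the sample mean is zero")
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)      # sample variance S_n^2

    a = [0.0] * n      # a[j] accumulates sum_i |x_j - x_i|
    tau_sum = 0.0      # sum over pairs of the kernel (x_i + x_j)|x_i - x_j| / 2
    for i in range(n):
        for j in range(i + 1, n):
            diff = abs(x[i] - x[j])
            a[i] += diff
            a[j] += diff
            tau_sum += 0.5 * (x[i] + x[j]) * diff

    pairs = n * (n - 1) / 2.0
    pair_sum = sum(a) / 2.0                 # sum of |x_i - x_j| over pairs i < j
    gmd = pair_sum / pairs                  # Gini's mean difference
    tau = tau_sum / pairs                   # estimator of tau

    # Leave-one-out mean differences and the pseudo-values W_jn.
    loo_pairs = (n - 1) * (n - 2) / 2.0
    w = [n * gmd - (n - 2) * (pair_sum - a[j]) / loo_pairs for j in range(n)]
    w_bar = sum(w) / n
    s2_w = sum((v - w_bar) ** 2 for v in w) / (n - 1)   # estimates 4 * sigma_1^2

    return (gmd ** 2 * s2 / (4 * mean ** 4) - gmd * tau / mean ** 3
            + gmd ** 2 / mean ** 2 + s2_w / (4 * mean ** 2))
```

Because the Gini index is scale invariant, rescaling all incomes by a positive constant leaves the returned value unchanged, which is a convenient sanity check on any implementation.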
From (12), we note that $N_d \ge (z_{\alpha/2}/d)^2 N_d^{-1}$, i.e., the final sample size must be at least $z_{\alpha/2}/d$. Therefore, we consider the pilot sample size to be $m = \max\{4, \lceil z_{\alpha/2}/d \rceil\}$. This technique of choosing the pilot sample size can also be found in Mukhopadhyay and De Silva (2009).
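As a quick arithmetic check of the pilot sample size, the sketch below evaluates the larger of 4 and $z_{\alpha/2}/d$ rounded up. The function name `pilot_sample_size` is hypothetical, and rounding up to an integer is an assumption that agrees with the pilot size of 165 reported in the simulation study for $d = 0.01$ and $\alpha = 0.1$.

```python
import math
from statistics import NormalDist


def pilot_sample_size(d, alpha):
    """Pilot sample size: the larger of 4 and z_{alpha/2} / d rounded up."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return max(4, math.ceil(z / d))


m = pilot_sample_size(0.01, 0.1)   # z_{0.05} / 0.01 is about 164.49, so m = 165
```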


Recall that the optimal sample size required to achieve a $100(1-\alpha)\%$ confidence interval for the Gini index is $C$, which is unknown in practice. The stopping variable $N_d$ defined in (12) serves as an estimator of $C$. Below, we develop a purely sequential procedure to estimate the optimal sample size $C$.


3.1 Implementation and Characteristics 

We propose the following purely sequential estimation procedure to estimate $C$:

Stage 1: Compute the pilot sample size $m = \max\{4, \lceil z_{\alpha/2}/d \rceil\}$ and draw a random sample of size $m$ from the population of interest. Based on this pilot sample of size $m$, obtain an estimate of $\xi^2$ by computing $V_m^2$ as given in (11) and check whether $m \ge (z_{\alpha/2}/d)^2 (V_m^2 + m^{-1})$. If $m < (z_{\alpha/2}/d)^2 (V_m^2 + m^{-1})$, then go to the next step. Otherwise, set the final sample size $N_d = m$.

Stage 2: Draw an additional observation independent of the pilot sample and update the estimate of $\xi^2$ by computing $V_{m+1}^2$. Check whether $m + 1 \ge (z_{\alpha/2}/d)^2 (V_{m+1}^2 + (m+1)^{-1})$. If $m + 1 < (z_{\alpha/2}/d)^2 (V_{m+1}^2 + (m+1)^{-1})$, then go to the next step. Otherwise, stop further sampling and report the final sample size as $N_d = m + 1$.

This process of collecting one observation in each stage after stage 1 is continued until there are $N_d$ observations such that $N_d \ge (z_{\alpha/2}/d)^2 (V_{N_d}^2 + N_d^{-1})$. At this stage, we stop sampling and report the final sample size as $N_d$.
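The stages above amount to a single loop: draw the pilot sample, then add one observation at a time until the stopping inequality holds. The sketch below shows that control flow only. The names `sequential_sample`, `draw`, and `variance_estimator` are hypothetical; the variance estimator is passed in as a function so that any implementation of $V_n^2$ can be plugged in.

```python
import math
from statistics import NormalDist


def sequential_sample(draw, variance_estimator, d, alpha):
    """Run the purely sequential rule; the length of the returned list is N_d."""
    z_over_d = NormalDist().inv_cdf(1.0 - alpha / 2.0) / d
    m = max(4, math.ceil(z_over_d))            # pilot sample size
    sample = [draw() for _ in range(m)]        # Stage 1: pilot sample
    while True:
        n = len(sample)
        v_sq = variance_estimator(sample)      # current estimate of xi^2
        if n >= z_over_d ** 2 * (v_sq + 1.0 / n):
            return sample                      # stopping inequality satisfied
        sample.append(draw())                  # otherwise draw one more observation
```

With a stub estimator that always returns 0.05 and with $d = 0.1$, $\alpha = 0.1$, the loop starts at $m = 17$ and stops at $n = 25$, which shows the effect of the correction term $n^{-1}$ on top of the variance term. The loop terminates only if the estimator stays bounded, so a real run would pass an estimator of $\xi^2$ in place of the stub.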

Based on the above algorithm, the sampling process will stop at some stage. This is proved in Lemma 1, which states that if observations are collected using (12), then, under appropriate conditions, $P(N_d < \infty) = 1$. This is a very important property of any sequential procedure since it mathematically ensures that the sampling will be terminated eventually.

Next, we establish some desirable asymptotic properties of our proposed sequential procedure. First, we prove that the final sample size $N_d$ required by our sampling strategy is close to the optimal sample size $C$, at least asymptotically. This property is known as the asymptotic efficiency property of a sequential procedure, which ensures that, on average, we collect only the minimum number of samples needed to achieve a certain accuracy of estimation. Second, we show that the fixed-width confidence interval $(\hat{G}_{N_d} - d, \hat{G}_{N_d} + d)$ contains the true value of the Gini index $G_F$ with probability nearly $1 - \alpha$. We formally state these results in theorems 1 and 2.

Theorem 1. If the parent distribution F is such that and E[X~h] exist for /9 > djf 

then the stopping rule in (12) yields the following asymptotic optimality properties:

(i) $N_d/C \to 1$ almost surely as $d \downarrow 0$.

(ii) $E(N_d/C) \to 1$ as $d \downarrow 0$.

Theorem 2. If the parent distribution F is such that exist, then the stopping rule in 

(12) yields

$$P\left(\hat{G}_{N_d} - d < G_F < \hat{G}_{N_d} + d\right) \to 1 - \alpha \quad \text{as } d \downarrow 0. \tag{13}$$


Theorems 1 and 2 are proved in the appendix. Part (i) of theorem 1 implies that the ratio of the final sample size of our procedure to the optimal sample size $C$ converges to 1 asymptotically. Part (ii) of theorem 1 implies that the ratio of the average final sample size of our procedure to $C$ converges to 1 asymptotically. This property is called the first-order asymptotic efficiency property, as can be found in Mukhopadhyay and De Silva (2009). Theorem 2 implies that the coverage probability produced by the fixed-width confidence interval $(\hat{G}_{N_d} - d, \hat{G}_{N_d} + d)$ attains the desired level $1 - \alpha$ asymptotically. This property is called asymptotic consistency. Thus, we prove that the proposed purely sequential procedure enjoys both the asymptotic efficiency property and the asymptotic consistency property.

^4 If, for a certain distribution function, negative moments do not exist, then theorem 1 will still hold if $E\left[\sup_{n \ge m} S_n^2 \bar{X}_n^{-2}\right]$, $E\left[\sup_{n \ge m} \hat{\tau}_n \bar{X}_n^{-2}\right]$, and $E\left[\sup_{n \ge m} S_{wn}^2 \bar{X}_n^{-2}\right]$ are finite.


4 Simulation Study 


In this section, we validate the asymptotic properties of our method stated in theorems 1 and 2 through a Monte Carlo study. To implement the sequential procedure, we fix $d$ ($= 0.01$) and $\alpha$ ($= 0.1$). Using the pilot sample size formula $m = \max\{4, \lceil z_{\alpha/2}/d \rceil\}$, the pilot sample size considered here is 165. Then, we implement the sequential procedure described in section 3.1 and estimate the average sample size ($\bar{N}$), the maximum sample size ($\max(N)$), the standard error ($s(\bar{N})$) of $\bar{N}$, the coverage probability ($p$), and its standard error ($s_p$) based on 2000 replications by drawing random samples from the gamma distribution (shape = 2.649, rate = 0.84), the log-normal distribution (mean = 2.185, sd = 0.562), and the Pareto distribution (20000, 5). Table 1 summarizes the numerical results obtained from the simulation study. The parameters of the log-normal and gamma distributions are the same as those used by Ransom and Cramer (1983).

From the fourth column of table 1 we find that the ratio of the average final sample size to $C$ is close to 1. Moreover, column 6 of table 1 illustrates that the attained coverage probability is very close to the desired level of 90%. Thus, we find that the simulation results agree with the theoretical results of the previous section, and the performance of the procedure is satisfactory for the above-mentioned distributions.
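Estimating the coverage probability in such a study requires the true Gini index of each parent distribution. The sketch below uses standard closed forms for the gamma, log-normal, and Pareto families; these formulas are textbook results rather than part of this paper. The function names are hypothetical, and reading "sd = 0.562" as the log-scale standard deviation and "Pareto (20000, 5)" as scale 20000 with shape 5 is an assumption.

```python
import math


def gini_gamma(shape):
    """Population Gini index of a gamma distribution (does not depend on the rate)."""
    return math.gamma(shape + 0.5) / (math.sqrt(math.pi) * math.gamma(shape + 1.0))


def gini_lognormal(sd_log):
    """Population Gini index of a log-normal distribution with log-scale sd."""
    return math.erf(sd_log / 2.0)


def gini_pareto(shape):
    """Population Gini index of a Pareto (type I) distribution (free of the scale)."""
    if shape <= 1:
        raise ValueError("the mean is infinite for shape <= 1")
    return 1.0 / (2.0 * shape - 1.0)


def coverage(estimates, true_gini, d):
    """Share of replications whose interval (g - d, g + d) contains true_gini."""
    hits = sum(1 for g in estimates if abs(g - true_gini) < d)
    return hits / len(estimates)
```

In a replication study, `estimates` would hold the sample Gini index computed at the final sample size of each run, and `coverage` would then give the empirical counterpart of $p$.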


5 Concluding Remarks 

The Gini index is a widely used measure of economic inequality. In order to evaluate the economic policies adopted by a government, it is important to estimate the Gini index at any specific time period. If the income data for all households in the region of interest are not available, one should estimate the Gini index by drawing a simple random sample of households from that region. This article develops a purely sequential procedure that provides a $100(1-\alpha)\%$ fixed-width confidence interval for the Gini index. Without assuming any specific distribution for the data, we show that the ratio of the final sample size to the optimal sample size approaches 1. We also show that the confidence interval constructed using our proposed sequential method attains the required coverage probability asymptotically. Thus, based on these results, we conclude that the proposed sequential estimation strategy can efficiently construct a $100(1-\alpha)\%$ fixed-width confidence interval for the Gini index. In this article, we consider that after the pilot sample, one additional observation is collected in each step. If instead a group of $r$ ($> 1$, say) observations is collected in each step after the pilot sample stage, the same properties will hold. The proofs will be similar to the ones in the Appendix.

Apart from economics, there are other fields where researchers report the Gini index. For instance, in social sciences and economics, the Gini index is used to measure inequality in education (see Thomas et al. (2001)). In ecology, the Gini index is used as a measure of biodiversity (e.g., see Wittebolle et al. (2009)). Asada (2005) used the Gini index as a measure of the inequality of health-related quality of life in a population. Shi and Sethu (2003) use the Gini index to evaluate the fairness achieved by internet routers in scheduling packet transmissions from different flows of traffic. The possible application of the Gini index in so many fields, such as sociology, health science, ecology, engineering, and chemistry, motivates us to develop the theory for constructing a fixed-width confidence interval for the Gini index.


6 Appendix 

Lemma 1. Under the assumption that $\xi < \infty$, for any $d > 0$, the stopping time $N_d$ is finite, that is, $P(N_d < \infty) = 1$.

Proof. Lemma 1 is proved by using (12) and the fact that $V_n^2$ is a strongly consistent estimator of $\xi^2$ and $N_d \to \infty$ as $d \downarrow 0$ almost surely. □

Lemma 2. The value of the sample Gini index lies between 0 and 1.

Proof. Let $Y_1 \le Y_2 \le \cdots \le Y_n$ be the ordered incomes of $n$ persons, where $Y_1$ represents the income of the poorest person and $Y_n$ represents the income of the richest person. Using Damgaard and Weiner (2000), the Gini index can be rewritten as

$$0 \le \hat{G}_n = \frac{\sum_{i=1}^{n} (2i - n - 1)\,Y_i}{(n-1)\sum_{i=1}^{n} Y_i} \le 1.$$

The lower bound holds because the $Y_i$ are ordered, and the upper bound holds because $2i - n - 1 \le n - 1$ and $Y_i \ge 0$. This proves the lemma. □


6.1 Proof of Theorem 1

In this subsection, we prove some lemmas that are essential to establish theorem 1 and theorem 2. First, we introduce some notation. Note from (12) that $N_d \ge z_{\alpha/2}/d$ (i.e., $N_d \ge m$) with probability 1. Suppose $\mathbf{X}_{(n)} = (X_{(1)}, \ldots, X_{(n)})$ denotes the $n$-dimensional vector of order statistics from the sample $X_1, \ldots, X_n$, and $\mathcal{F}_n$ is the $\sigma$-algebra generated by $(\mathbf{X}_{(n)}, X_{n+1}, X_{n+2}, \ldots)$. By Lee (1990), $\{\bar{X}_n, \mathcal{F}_n\}$, $\{\hat{\tau}_n, \mathcal{F}_n\}$, $\{\hat{\Delta}_n, \mathcal{F}_n\}$, and their convex functions are all reverse submartingales. Using reverse submartingale properties, let us prove the following lemmas.

Lemma 3. Let $\bar{X}_n$ be the sample mean based on non-negative i.i.d. observations $X_1, \ldots, X_n$. Then, if $E(X^{-s}) < \infty$ for $s > r$ and $r \ge 1$,

$$E\left(\max_{n \ge m} \frac{1}{\bar{X}_n^{\,r}}\right) < \infty. \tag{14}$$


Proof. For $r \ge 1$, we have

$$E\left(\max_{n \ge m} \frac{1}{\bar{X}_n^{\,r}}\right) \le 1 + \int_1^{\infty} P\left(\max_{n \ge m} \frac{1}{\bar{X}_n^{\,r}} > t\right) dt \le 1 + \frac{E\left(\bar{X}_m^{-r\beta}\right)}{\beta - 1}, \tag{15}$$

where $\beta > 1$. The last inequality is obtained by applying the maximal inequality for reverse submartingales (see Lee (1990)). Let $s = r\beta$. Now, it is enough to show that if $E(X^{-s}) < \infty$, then $E[\bar{X}_n^{-s}] < \infty$. Note that $\bar{X}_n \ge \left(\prod_{i=1}^{n} X_i\right)^{1/n}$ as the observations are nonnegative, and

$$E\left[\bar{X}_n^{-s}\right] \le E\left[\prod_{i=1}^{n} X_i^{-s/n}\right] = \left\{E\left[X_1^{-s/n}\right]\right\}^{n}. \tag{16}$$

The last equality is due to the i.i.d. property of the observations. We know that $\{E(|X|^p)\}^{1/p}$ is a nondecreasing function of $p$ for $p > 0$. Applying this result with $p = 1/n \le 1$ in (16), we complete the proof. □


Lemma 4. 


If E{Xf) and E{X^ exist for /? > 4, then 


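$$E\left[\sup_{n \ge m} V_n^2\right] < \infty \quad \text{for } m \ge 4.$$

Proof. To prove lemma 4 it is enough to show that $E\left[\sup_{n \ge m} \hat{\Delta}_n^2 S_n^2 \bar{X}_n^{-4}\right]$, $E\left[\sup_{n \ge m} \hat{\Delta}_n \hat{\tau}_n \bar{X}_n^{-3}\right]$, $E\left[\sup_{n \ge m} \hat{\Delta}_n^2 \bar{X}_n^{-2}\right]$, and $E\left[\sup_{n \ge m} S_{wn}^2 \bar{X}_n^{-2}\right]$ are finite. We note that $0 \le \hat{\Delta}_n/(2\bar{X}_n) \le 1$. So, it is enough to show that $E\left[\sup_{n \ge m} S_n^2 \bar{X}_n^{-2}\right]$, $E\left[\sup_{n \ge m} \hat{\tau}_n \bar{X}_n^{-2}\right]$, and $E\left[\sup_{n \ge m} S_{wn}^2 \bar{X}_n^{-2}\right]$ are finite. Following Sen and Ghosh (1981, p. 338), we have $E\left[\sup_{n \ge m} S_n^4\right] < \infty$ if $E[X^{a}] < \infty$ for $a > 4$ and $m \ge 4$. By lemma 3, $E\left[\sup_{n \ge m} \bar{X}_n^{-4}\right] < \infty$ if $E[X^{-\beta}] < \infty$ for $\beta > 4$. Therefore,

$$E\left(\sup_{n \ge m} S_n^2 \bar{X}_n^{-2}\right) \le \left\{E\left(\sup_{n \ge m} S_n^4\right)\right\}^{1/2} \left\{E\left(\sup_{n \ge m} \bar{X}_n^{-4}\right)\right\}^{1/2} < \infty. \tag{17}$$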


We note that and are U-statistics. Using lemma 9.2.4 of iGhosh et all fll997l) . 


E fsnp jr^l^ < AE (|r^|) and E ("snp E (|^^|) . 

\n>m / \n>m J J 


Applying the Cauchy-Schwarz inequality,


E I snp 

n>m 


Tn 




< <1 U ( snp |f„ 

\n>m 


E ( snp 

n>m 


x:^ 


< oo. 
and


E 


sup 

n>m 



< < E 


sup pj 

n>m 



sup 

n>m 



1 

2 


< OC, 


if E{Xf) and E{X^ exist for (3 > A. This completes the proof of lemma IH 
□

Below, we prove theorem 1 by using lemmas 3 and 4.


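(i) The definition of the stopping rule in (12) yields

$$\left(\frac{z_{\alpha/2}}{d}\right)^2 V_{N_d}^2 \le N_d \le m I(N_d = m) + \left(\frac{z_{\alpha/2}}{d}\right)^2 \left(V_{N_d - 1}^2 + (N_d - 1)^{-1}\right). \tag{18}$$

Since $N_d \to \infty$ a.s. as $d \downarrow 0$ and $V_n^2 \to \xi^2$ a.s. as $n \to \infty$, by theorem 2.1 of Gut (2009), $V_{N_d}^2 \to \xi^2$ a.s. Hence, dividing all sides of (18) by $C$ and letting $d \downarrow 0$, we prove $N_d/C \to 1$ a.s. as $d \downarrow 0$.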

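(ii) Since $N_d \ge m$ a.s., dividing (18) by $C$ yields

$$N_d/C - m I(N_d = m)/C \le \frac{1}{\xi^2}\left(\sup_{d > 0} V_{N_d - 1}^2 + (m - 1)^{-1}\right) \quad \text{almost surely.} \tag{19}$$

Since $E\left(\sup_{d > 0} V_{N_d - 1}^2\right) < \infty$ by lemma 4 and $N_d/C \to 1$ a.s. as $d \downarrow 0$, by the dominated convergence theorem, we conclude that $\lim_{d \downarrow 0} E(N_d/C) = 1$. This completes the proof of theorem 1.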


6.2 Proof of Theorem 2

In order to show that our procedure satisfies the asymptotic consistency property, we will derive an Anscombe-type random central limit theorem for the Gini index. This requires the existence of the usual central limit theorem for the Gini index and the uniform continuity in probability (u.c.i.p.) condition. For details about the u.c.i.p. condition, we refer to Anscombe (1953), Sproule (1969), Isogai (1986), Mukhopadhyay and Chattopadhyay (2012), etc.
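First of all, let us define $n_1 = (1 - \rho)C$ and $n_2 = (1 + \rho)C$ for $0 < \rho < 1$. Now, we know from Xu (2007) that

$$Y_n = \sqrt{n}\begin{pmatrix} \hat{\Delta}_n - \Delta \\ \bar{X}_n - \mu \end{pmatrix} \xrightarrow{D} N_2(0, \Sigma), \quad \text{where} \quad \Sigma = \begin{pmatrix} 4\sigma_1^2 & 2(\tau - \mu\Delta) \\ 2(\tau - \mu\Delta) & \sigma^2 \end{pmatrix}.$$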
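First, let us prove that $Y_{N_d} \xrightarrow{D} N_2(0, \Sigma)$. Define $D' = (a_0 \;\; a_1)$. Note that $D'Y_{N_d} = D'Y_C + (D'Y_{N_d} - D'Y_C)$. Thus, it is enough to show that $(D'Y_{N_d} - D'Y_C) \xrightarrow{P} 0$ as $d \downarrow 0$. We can write

$$D'Y_{N_d} - D'Y_C = a_0\sqrt{N_d}\left(\hat{\Delta}_{N_d} - \hat{\Delta}_C\right) + a_1\sqrt{N_d}\left(\bar{X}_{N_d} - \bar{X}_C\right) + \left(\sqrt{N_d/C} - 1\right) D'Y_C. \tag{20}$$

Fix some $\varepsilon > 0$ and note that

$$\begin{aligned}
& P\left\{\left|a_0\sqrt{N_d}\left(\hat{\Delta}_{N_d} - \hat{\Delta}_C\right) + a_1\sqrt{N_d}\left(\bar{X}_{N_d} - \bar{X}_C\right)\right| > \varepsilon\right\} \\
&\quad \le P\left\{\left|a_0\sqrt{N_d}\left(\hat{\Delta}_{N_d} - \hat{\Delta}_C\right) + a_1\sqrt{N_d}\left(\bar{X}_{N_d} - \bar{X}_C\right)\right| > \varepsilon, \; |N_d - C| < \rho C\right\} + P\left[|N_d - C| > \rho C\right] \\
&\quad \le P\left\{\max_{n_1 < n < n_2} \sqrt{n}\left|\hat{\Delta}_n - \hat{\Delta}_C\right| > \frac{\varepsilon}{2|a_0|}\right\} + P\left\{\max_{n_1 < n < n_2} \sqrt{n}\left|\bar{X}_n - \bar{X}_C\right| > \frac{\varepsilon}{2|a_1|}\right\} + P\left[|N_d - C| > \rho C\right].
\end{aligned}$$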

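Here, $\hat{\Delta}_n$ and $\bar{X}_n$ are both U-statistics which satisfy Anscombe's u.c.i.p. condition (e.g., see Sproule (1969)). Using the u.c.i.p. condition and the fact that $N_d/C \xrightarrow{P} 1$, we conclude that for given $\varepsilon > 0$, there exist $\rho > 0$ and $d_0 > 0$ such that

$$P\left\{\left|a_0\sqrt{N_d}\left(\hat{\Delta}_{N_d} - \hat{\Delta}_C\right) + a_1\sqrt{N_d}\left(\bar{X}_{N_d} - \bar{X}_C\right)\right| > \varepsilon\right\} < \rho \quad \text{for all } d \le d_0.$$

This implies $a_0\sqrt{N_d}(\hat{\Delta}_{N_d} - \hat{\Delta}_C) + a_1\sqrt{N_d}(\bar{X}_{N_d} - \bar{X}_C) \xrightarrow{P} 0$ as $d \downarrow 0$. Also, note that $\left(\sqrt{N_d/C} - 1\right) D'Y_C \xrightarrow{P} 0$ as $d \downarrow 0$ since $N_d/C \to 1$ almost surely and $D'Y_C \xrightarrow{D} N(0, D'\Sigma D)$. Thus, from (20), we conclude $(D'Y_{N_d} - D'Y_C) \xrightarrow{P} 0$, that is, $Y_{N_d} \xrightarrow{D} N_2(0, \Sigma)$. Now, define $G(u, v) = \frac{u}{2v}$, if $v \ne 0$. Using Taylor's expansion, we can write

$$\sqrt{N_d}\left(G(\hat{\Delta}_{N_d}, \bar{X}_{N_d}) - G(\Delta, \mu)\right) = \sqrt{N_d}\left(\frac{1}{2\mu}\left(\hat{\Delta}_{N_d} - \Delta\right) - \frac{\Delta}{2\mu^2}\left(\bar{X}_{N_d} - \mu\right) + R_{N_d}\right), \tag{21}$$

where $R_{N_d} = -2(\hat{\Delta}_{N_d} - \Delta)(\bar{X}_{N_d} - \mu)/b^2 + 4a(\bar{X}_{N_d} - \mu)^2/b^3$, $a = \Delta + \rho(\hat{\Delta}_{N_d} - \Delta)$, $b = 2\mu + \rho(2\bar{X}_{N_d} - 2\mu)$, and $\rho \in (0, 1)$. Rewriting (21) in vector-matrix form, we get

$$\sqrt{N_d}\left(G(\hat{\Delta}_{N_d}, \bar{X}_{N_d}) - G(\Delta, \mu)\right) = D'Y_{N_d} + \sqrt{N_d}\,R_{N_d}, \tag{22}$$

where $D' = \left(\frac{1}{2\mu}, \; -\frac{\Delta}{2\mu^2}\right)$. Note that $\sqrt{N_d}(\bar{X}_{N_d} - \mu)$ converges in distribution to a normal distribution by Anscombe's CLT, and both $(\hat{\Delta}_{N_d} - \Delta)$ and $(\bar{X}_{N_d} - \mu)$ converge to 0 almost surely. This yields $\sqrt{N_d}\,R_{N_d} \xrightarrow{P} 0$ as $d \downarrow 0$. Hence, $\sqrt{N_d}(\hat{G}_{N_d} - G_F) \xrightarrow{D} N(0, D'\Sigma D)$ as $d \downarrow 0$. This completes the proof of theorem 2.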
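As a consistency check, the limiting variance $D'\Sigma D$ can be expanded directly. The worked computation below takes $D' = (1/(2\mu), -\Delta/(2\mu^2))$ and the covariance matrix $\Sigma$ with entries $4\sigma_1^2$, $2(\tau - \mu\Delta)$, and $\sigma^2$ as stated in this proof, and recovers the four terms of the asymptotic variance $\xi^2$ in (6).

```latex
\begin{aligned}
D'\Sigma D
  &= \frac{1}{4\mu^2}\left(4\sigma_1^2\right)
     + 2\cdot\frac{1}{2\mu}\cdot\left(-\frac{\Delta}{2\mu^2}\right)\cdot 2(\tau - \mu\Delta)
     + \frac{\Delta^2}{4\mu^4}\,\sigma^2 \\
  &= \frac{\sigma_1^2}{\mu^2} - \frac{\Delta(\tau - \mu\Delta)}{\mu^3}
     + \frac{\Delta^2\sigma^2}{4\mu^4} \\
  &= \frac{\Delta^2\sigma^2}{4\mu^4} - \frac{\Delta\tau}{\mu^3}
     + \frac{\Delta^2}{\mu^2} + \frac{\sigma_1^2}{\mu^2} \;=\; \xi^2 .
\end{aligned}
```

So the normal limit obtained for the randomly stopped estimator has the same variance as the fixed-sample limit in (5), which is what the coverage statement of theorem 2 requires.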

REFERENCES

Anscombe, F. J. (1953). Sequential estimation. Journal of the Royal Statistical Society, Series B 15: 1-29.

Arnold, B. C. (2005). Inequality measures for multivariate distributions. Metron 63: 317-327.

Asada, Y. (2005). Assessment of the health of Americans: the average health-related quality of life and its inequality across individuals and groups. Population Health Metrics 3: 7.

Bishop, J. A., Formby, J. P., and Zheng, B. (1997). Statistical inference and the Sen index of poverty. International Economic Review 38: 381-387.

Damgaard, C. and Weiner, J. (2000). Describing inequality in plant size or fecundity. Ecology 81: 1139-1142.

Dantzig, G. B. (1940). On the non-existence of tests of "Student's" hypothesis having power functions independent of $\sigma$. The Annals of Mathematical Statistics 11: 186-192.

Davidson, R. (2009). Reliable inference for the Gini index. Journal of Econometrics 150: 30-40.

Doob, J. L. (1953). Stochastic processes. New York: Wiley.

Gastwirth, J. L. (1972). The estimation of the Lorenz curve and Gini index. The Review of Economics and Statistics 54: 306-316.

Ghosh, M., Mukhopadhyay, N., and Sen, P. K. (1997). Sequential estimation. New York: Wiley.

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. The Annals of Mathematical Statistics 19: 293-325.

Isogai, E. (1986). Asymptotic consistency of fixed-width sequential confidence intervals for a multiple regression function. Annals of the Institute of Statistical Mathematics 38: 69-83.

Lee, A. J. (1990). U-statistics: Theory and Practice. CRC Press.

Loève, M. (1963). Probability theory. Princeton, NJ: Van Nostrand.

Mukhopadhyay, N. and Chattopadhyay, B. (2012). A tribute to Frank Anscombe and random central limit theorem from 1952. Sequential Analysis 31: 265-277.

Mukhopadhyay, N. and De Silva, B. M. (2009). Sequential methods and their applications. CRC Press.

Ransom, M. R. and Cramer, J. S. (1983). Income distribution functions with disturbances. European Economic Review 22: 363-372.

Sen, P. K. (1981). Sequential nonparametrics: Invariance principles and statistical inference. New York: Wiley.

Sen, P. K. and Ghosh, M. (1981). Sequential point estimation of estimable parameters based on U-statistics. Sankhyā: The Indian Journal of Statistics, Series A 43: 331-344.

Shi, H. and Sethu, H. (2003). Greedy fair queuing: A goal-oriented strategy for fair real-time packet scheduling. 24th IEEE Real-Time Systems Symposium 345-356.

Sproule, R. (1969). A sequential fixed-width confidence interval for the mean of a U-statistic. Ph.D. dissertation, Univ. of North Carolina.

Thomas, V., Wang, Y., and Fan, X. (2001). Measuring education inequality: Gini coefficients of education. Vol. 2525. World Bank Publications.

Wittebolle, L., Marzorati, M., Clement, L., Balloi, A., Daffonchio, D., Heylen, K., De Vos, P., Verstraete, W., and Boon, N. (2009). Initial community evenness favors functionality under selective stress. Nature 458: 623-626.

Xu, K. (2007). U-statistics and their asymptotic results for some inequality and poverty measures. Econometric Reviews 26: 567-577.


Table 1. Performance of the proposed sequential procedure when the data are from the Gamma, Log-normal, and Pareto distributions. The standard errors $s(\bar{N})$ and $s_p$ are given in parentheses.

Distribution   $\bar{N}$ ($s(\bar{N})$)   $C$    $\bar{N}/C$   $\max(N)$   $p$ ($s_p$)
Gamma          1259.492 (4.3639)          1267   0.9941        1594        0.878 (0.0073)
Log-normal     1429.349 (4.1393)          1424   1.0038        2391        0.9015 (0.0067)
Pareto         654.5364 (4.2151)          686    0.9541        1666        0.9018 (0.0063)